Guarantees on learning depth-2 neural networks under a data-poisoning attack
In recent times many state-of-the-art machine learning models have been shown
to be fragile to adversarial attacks. In this work we take steps towards a
theoretical understanding of adversarially robust learning with neural nets.
We exhibit a specific class of finite-size neural networks and a non-gradient
stochastic algorithm that tries to recover the weights of the net generating
the realizable true labels, in the presence of an oracle applying a bounded
amount of malicious additive distortion to those labels. We prove (nearly
optimal) trade-offs among the magnitude of the adversarial attack, the
accuracy, and the confidence achieved by the proposed algorithm. Comment: 11 pages
Size Lowerbounds for Deep Operator Networks
Deep Operator Networks (DeepONets) are an increasingly popular paradigm for
solving regression in infinite dimensions, and hence for solving families of
PDEs in one shot. In this work, we aim to establish a first-of-its-kind
data-dependent lower bound on the size of DeepONets required for them to be
able to reduce empirical error on noisy data. In particular, we show that for
low training errors to be obtained on the data, the common output dimension of
the branch and the trunk net must necessarily grow with the number of training
data points. This inspires our experiments with DeepONets solving the
advection-diffusion-reaction PDE, where we demonstrate that, at a fixed model
size, to benefit from an increase in this common output dimension and obtain a
monotonic lowering of the training error, the size of the training data might
necessarily need to scale quadratically with it. Comment: 21 pages, 3 figures
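To make the role of the common output dimension concrete, here is a minimal,
hypothetical sketch of the DeepONet architecture: single-layer branch and
trunk nets that both map into R^q, with the prediction given by their inner
product. The sizes and the tanh activation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def deeponet(u_sensors, y_query, Wb, Wt):
    """Minimal DeepONet sketch: G(u)(y) ~ <branch(u), trunk(y)>.
    The branch net maps the sensor values of the input function u to R^q,
    the trunk net maps the query location y to R^q, and the prediction is
    their inner product -- q is the common output dimension."""
    b = np.tanh(u_sensors @ Wb)   # branch output, shape (q,)
    t = np.tanh(y_query @ Wt)     # trunk output, shape (q,)
    return b @ t

m, dy, q = 10, 1, 8               # sensors, query dim, common output dim (hypothetical)
Wb = rng.normal(size=(m, q))
Wt = rng.normal(size=(dy, q))

u = rng.normal(size=m)            # the input function sampled at m sensor points
y = rng.normal(size=dy)           # a location at which to evaluate the output function
print(deeponet(u, y, Wb, Wt))     # a scalar prediction
```

The lower bound discussed above concerns how large q must be, as a function of
the data, for such a model to fit noisy training sets well.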
A Study of Neural Training with Iterative Non-Gradient Methods
In this work, we demonstrate provable guarantees on the training of a single
ReLU gate in hitherto unexplored regimes. We give a simple iterative stochastic
algorithm that can train a ReLU gate in the realizable setting in linear time
while using significantly milder conditions on the data distribution than
previous such results.
Leveraging certain additional moment assumptions, we also show a
first-of-its-kind approximate recovery of the true label-generating parameters
under an (online) data-poisoning attack on the true labels, while training a
ReLU gate by the same algorithm. Our guarantee is shown to be nearly optimal in
the worst case, and the accuracy with which it recovers the true weights
degrades gracefully with the probability and magnitude of the attack.
For both the realizable and the non-realizable cases as outlined above, our
analysis allows for mini-batching and computes how the convergence time scales
with the mini-batch size. We corroborate our theorems with simulation results
which also bring to light a striking similarity in trajectories between our
algorithm and the popular SGD algorithm, for which similar guarantees are
still unknown. Comment: 25 pages, 5 figures. Accepted for publication in the
journal "Neural Networks".
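One plausible reading of such an iterative non-gradient scheme is a
GLM-tron-style mini-batch update, which corrects the weights by the prediction
residual but never uses the derivative of the ReLU. The sketch below is an
illustration of that idea in the realizable setting; the step size, batch
size, iteration count, and Gaussian data are assumptions, not the paper's
exact choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, batch = 10, 5000, 32

w_star = rng.normal(size=d)                    # true weights (hypothetical)
X = rng.normal(size=(n, d))
y = np.maximum(X @ w_star, 0.0)                # realizable labels from a ReLU gate

w = np.zeros(d)
eta = 0.5 / d                                  # illustrative step size
for _ in range(2000):
    idx = rng.integers(0, n, size=batch)       # mini-batch sampling
    Xb, yb = X[idx], y[idx]
    # GLM-tron-style update: the residual multiplies the raw input, and no
    # derivative of the ReLU appears, so this is not a (stochastic) gradient step.
    w += eta * (yb - np.maximum(Xb @ w, 0.0)) @ Xb / batch

print(np.linalg.norm(w - w_star))              # recovery error; should be small
```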
Global Convergence of SGD On Two Layer Neural Nets
In this note we demonstrate provable convergence of SGD to the global minima
of appropriately regularized empirical risk of depth-2 nets -- for
arbitrary data and with any number of gates, if they are using adequately
smooth and bounded activations like sigmoid and tanh. We build on the results
in [1] and leverage a constant amount of Frobenius norm regularization on the
weights, along with sampling of the initial weights from an appropriate
distribution. We also give a continuous time SGD convergence result that also
applies to smooth unbounded activations like SoftPlus. Our key idea is to show
the existence of loss functions on constant-sized neural nets which are
"Villani functions". [1] Bin Shi, Weijie J. Su, and Michael I. Jordan. On
learning rates and Schrödinger operators, 2020. arXiv:2004.06977. Comment: 23
pages, 6 figures. Extended abstract accepted at DeepMath 2022. v2 update: new
experiments added in Section 3.2 to study the effect of the regularization
value. The statement of Theorem 3.4 about SoftPlus nets has been improved.
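A minimal sketch of this setting, assuming a one-hidden-layer sigmoid net,
arbitrary labels, Gaussian-sampled initial weights, and hypothetical values
for the Frobenius-norm regularization strength and the step size:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 3, 4, 200                          # input dim, gates, samples (hypothetical)

X = rng.normal(size=(n, d))
y = rng.normal(size=n)                       # arbitrary data: no realizability assumed

W = 0.1 * rng.normal(size=(k, d))            # initial weights sampled from a Gaussian
a = 0.1 * rng.normal(size=k)
lam = 0.1                                    # Frobenius-norm regularization strength
eta = 0.05                                   # step size

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    i = rng.integers(n)                      # one-sample SGD
    h = sigmoid(W @ X[i])                    # hidden layer: smooth, bounded activation
    err = a @ h - y[i]
    # gradients of 0.5*err^2 + (lam/2)*(||W||_F^2 + ||a||^2)
    grad_a = err * h + lam * a
    grad_W = np.outer(err * a * h * (1 - h), X[i]) + lam * W
    a -= eta * grad_a
    W -= eta * grad_W

reg_risk = 0.5 * np.mean((sigmoid(X @ W.T) @ a - y) ** 2) \
           + 0.5 * lam * (np.sum(W ** 2) + np.sum(a ** 2))
print(reg_risk)                              # the regularized empirical risk
```

The constant Frobenius-norm penalty and the Gaussian initialization mirror the
two ingredients the note leverages; the convergence guarantee itself is, of
course, the content of the paper, not of this toy loop.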
A Study Of The Mathematics Of Deep Learning
"Deep Learning"/"Deep Neural Nets" is a technological marvel that is now increasingly deployed at the cutting edge of artificial intelligence tasks. This ongoing revolution can be said to have been ignited by the iconic 2012 paper from the University of Toronto titled "ImageNet Classification with Deep Convolutional Neural Networks" by Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton. This paper showed that deep nets can be used to classify images into meaningful categories with almost human-like accuracy. As of 2020 this approach continues to produce unprecedented performance for an ever-widening variety of novel purposes, ranging from playing chess to self-driving cars to experimental astrophysics and high-energy physics. But this newfound astonishing success of deep neural nets in the last few years has hinged on an enormous amount of heuristics, and it has turned out to be extremely challenging to explain with mathematical rigour. In this thesis we take several steps towards building strong theoretical foundations for these new paradigms of deep learning.
Our proofs here can be broadly grouped into three categories:
1.
Understanding Neural Function Spaces
We show new circuit complexity theorems for deep neural functions over real and Boolean inputs and prove classification theorems about these function spaces, which in turn lead to exact algorithms for empirical risk minimization for depth-2 ReLU nets.
We also motivate a measure of complexity of neural functions and leverage techniques from polytope geometry to constructively establish the existence of high-complexity neural functions.
2.
Understanding Deep Learning Algorithms
We give fast iterative stochastic algorithms which can learn near-optimal approximations of the true parameters of a ReLU gate in the realizable setting. (There are improved versions of this result available in our papers https://arxiv.org/abs/2005.01699 and https://arxiv.org/abs/2005.04211 which are not included in the thesis.)
We also establish the first-ever (a) mathematical control on the behaviour of noisy gradient descent on a ReLU gate, and (b) proofs of convergence of stochastic and deterministic versions of the widely used adaptive-gradient deep-learning algorithms, RMSProp and ADAM. This study also includes a first-of-its-kind detailed empirical study of the hyperparameter values and neural net architectures for which these modern algorithms have a significant advantage over classical acceleration-based methods.
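For reference, the RMSProp update analyzed above can be sketched as follows on
a toy convex objective; the hyperparameter values and the objective are
illustrative choices for the demo, not those of the study.

```python
import numpy as np

def rmsprop_step(w, grad, v, eta=1e-2, beta=0.9, eps=1e-8):
    """One RMSProp update: keep an exponential moving average of the squared
    gradient and divide the step coordinate-wise by its square root."""
    v = beta * v + (1 - beta) * grad ** 2
    w = w - eta * grad / (np.sqrt(v) + eps)
    return w, v

# Minimize f(w) = ||w||^2 / 2, whose gradient at w is simply w.
w = np.array([5.0, -3.0])
v = np.zeros_like(w)
for _ in range(3000):
    w, v = rmsprop_step(w, w, v)             # grad of f is w itself
print(w)                                     # drifts toward the minimizer at 0
```

Note that the coordinate-wise division makes the effective step size roughly
constant regardless of the gradient's scale, which is the behaviour whose
convergence such analyses must control.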
3.
Understanding The Risk Of (Stochastic) Neural Nets
We push forward the emergent technology of PAC-Bayesian bounds on the risk of stochastic neural nets to get bounds which are not only empirically smaller than contemporary theories but also demonstrate smaller rates of growth w.r.t. increase in the width and depth of the net in experimental tests. These critically depend on our novel theorems proving noise resilience of nets.
This work also includes an experimental investigation of the geometric properties of the path in weight space that is traced out by the net during training. This leads us to uncover certain seemingly uniform and surprising geometric properties of this process which can potentially be leveraged into better bounds in the future.